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ABSTRACT 


In  digital  adders,  the  speed  of  addition  is  limited  by  the  time  required  to  transmit  a carry  through  the  adder.  Carry 
Select  Adder  (CSLA)  is  one  of  the  fastest  adders  used  in  many  data-processing  processors  to  perform  fast  arithmetic 
functions.  Now  days  we  use  the  very  High  Speed  IC  Designs  In  that  designs  we  required  very  High  Speed  Structures. 
Parallel  prefix  adder  is  the  most  flexible  and  widely  used  for  binary  addition.  Parallel  Prefix  adders  are  best  suited  for 
VLSI  implementation.  Numbers  of  parallel  prefix  adder  structures  have  been  proposed  over  the  past  years  intended  to 
optimize  area,  fan-out,  logic  depth  and  inter  connect  count.  This  paper  presents  a new  approach  to  redesign  the  basic 
operators  used  in  parallel  prefix  architectures.  Based  on  this  modification  8,  16,  32,  64  and  128-bit  Kogge-Stone  Parallel 
Prefix  Adders  architectures  have  been  developed  and  compared  with  the  regular  and  Modified  CSLA  architecture.  The 
modified  CSLA  design  has  reduced  area  and  Power  as  compared  with  the  regular  CSLA  but  delay  is  increased.  The 
Proposed  Kogge-Stone  adder  has  less  delay  when  compared  with  regular  and  modified  CSLA.  This  work  estimates  the 
performance  of  the  proposed  designs  in  terms  of  delay,  area  are  implemented  in  Xilinx  ISE. 

KEYWORDS:  High  Speed  VLSI,  CSLA,  BEC,  Parallel  Prefix  Adder,  Dot  Operators’  Parallel  Prefix  Adder 

INTRODUCTION 


The  main  area  of  research  in  VLSI  system  design  are  Area  and  power  reduction  in  data  path  logic  systems.  The 
main  fundamental  requirement  of  high-performance  processors  and  systems  is  High-speed  addition  and  multiplication.  Due 
to  the  time  required  to  propagate  a carry  through  the  adder,  the  speed  of  addition  is  limited  in  digital  adders.  In  an 
elementary  adder  the  sum  for  each  bit  position  is  generated  sequentially  only  after  the  previous  bit  position  has  been 
summed  and  a carry  propagated  into  the  next  position.  The  production  of  carries  is  the  major  speed  limitation  in  any  adder. 


The  Carry  Select  Adder  is  used  in  many  computational  systems  to  moderate  the  problem  of  carry  propagation 
delay  by  independently  generating  multiple  carries  and  then  select  a carry  to  generate  the  sum.  However,  the  Carry  Select 
Adder  is  not  area  efficient  because  it  uses  multiple  pairs  of  Ripple  Carry  Adders  (RCA)  to  generate  partial  sum  and  carry 
by  considering  carry  input  and  then  the  final  sum  and  carry  are  selected  by  the  multiplexers  (mux).  To  overcome  the  above 
problem,  n-bit  binary  to  excess- 1 code  converter  (BEC)  is  used  to  improve  the  speed  of  addition.  This  logic  can  be 
implemented  with  any  type  of  adder  to  further  improve  the  speed..  The  main  advantage  of  this  BEC  logic  comes  from  the 
lesser  number  of  logic  gates  than  the  Lull  Adder  (LA)  structure. 


In  electronics,  The  carry-select  adder  is  one  of  the  simple  but  rather  faster,  having  a gate  level  depth  of 
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in  computing 
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The  regular  carry- select  adder  generally  consists  of  two  ripple  carry  adders  and  a multiplexer.  Adding  two  n-bit 
numbers  with  a carry-select  adder  is  done  with  the  two  ripple  carry  adders.  One  ripple  carry  adder  is  used  to  perform  the 
calculation  with  the  assumption  of  the  carry  being  zero  and  the  other  ripple  carry  adder  is  used  to  perform  the  calculation 
assuming  carry  one.  After  the  two  results  are  calculated,  the  correct  sum,  as  well  as  the  correct  carry,  is  then  selected  with 
the  multiplexer  once  the  correct  carry  is  known.  The  number  of  bits  in  each  carry  select  block  can  be  uniform,  or  variable. 

Basic  Building  Block 

The  basic  building  block  of  a carry-select  adder  is  as  shown  in  Figure  1 where  the  block  size  is  4.  Two  4 -bit  ripple 
carry  adders  are  multiplexed  together,  where  the  resulting  carry  and  sum  bits  are  selected  by  the  value  of  carry-in.  One 
ripple  carry  adder  sum  is  considered  when  carry-in  is  0,  and  the  other  sum  is  considered  when  carry -in  is  1. 


Figure  1:  Basic  Carry  Select  Adder 


Uniform-Sized  Adder 

A 16-bit  carry-select  adder  with  a uniform  block  size  of  4 can  be  created  with  three  of  these  blocks  and  a 4 -bit 
ripple  carry  adder.  Since  carry-in  is  known  at  the  beginning  of  computation,  a carry  select  block  is  not  needed  for  the  first 
four  bits.  The  delay  of  this  adder  will  be  four  full  adder  delays,  plus  three  MUX  delays. 


Variable-Sized  Adder 


Figure  2:  Regular  Fixed  Size  CSLA 
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Figure  3:  Variable  Sized  CSLA 

The  above  shown  figure  is  a 16-bit  carry-select  adder  with  variable  size  with  block  sizes  of  2-2-3-4-5.  This  break- 
up is  ideal  when  the  full-adder  delay  is  equal  to  the  MUX  delay,  which  is  unlikely.  The  total  delay  is  two  full  adder  delays, 
and  four  mux  delays. 
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BASIC  STRUCTURE  OF  REGULAR  16-BIT  CSLA 

A 16-bit  carry  select  has  two  types  of  block  size  namely  uniform  block  size  and  variable  block  size.  A 16-bit  carry 
select  adder  with  a uniform  block  size  has  the  delay  of  four  full  adder  delays  and  three  MUX  delays.  While  a 16 -bit  carry 
select  adder  with  variable  block  size  has  the  delay  of  two  full  adder  delays,  and  four  mux  delays.  Therefore  we  use  16 -bit 
carry  select  adder  with  variable  block  size.  Ripple-carry  adders  are  the  simplest  and  most  compact  full  adders,  but  their 
performance  is  limited  by  a carry  that  must  ripple  from  the  least- significant  to  the  most-significant  bit.  A carry-select  adder 
achieves  speeds  40%  to  90%  faster  by  performing  additions  in  parallel  and  reducing  the  maximum  carry  path. 

A carry-select  adder  is  divided  into  sectors,  each  of  which,  except  for  the  least  significant  performs  two  additions 
in  parallel,  one  assuming  a carry-in  of  zero,  the  other  a carry-in  of  one  within  the  sector,  there  are  two  4-bit  ripple-  carry 
adders  receiving  the  same  data  inputs  but  different  Cin.  The  upper  adder  has  a carry-in  of  zero,  the  lower  adder  a carry-in 
of  one.  The  actual  Cin  from  the  preceding  sector  selects  one  of  the  two  adders.  If  the  carry -in  is  zero,  the  sum  and  carry-out 
of  the  upper  adder  are  selected.  If  the  carry-in  is  one,  the  sum  and  carry-out  of  the  lower  adder  are  selected.  Logically,  the 
result  is  not  different  if  a single  ripple-carry  adder  were  used. 

First  the  coding  for  full  adder  and  different  multiplexers  of  6:3,  8:4,  10:5,  and  12:6  was  done.  Then  2,  3,  4,  5 -bit 
ripple  carry  adder  was  done  by  calling  the  full  adder.  The  regular  128 -bit  CSLA  was  created  by  calling  the  ripple  carry 
adders  and  all  multiplexers  based  on  circuit. 


.4(15:11]  BpjJl]  AP0:7]  B[ll:7]  A|M]  B|W|  Apj|  Bfi]  PI  Bp] 


Figure  4:  Regular  16-bit  SQRT  CSLA 
BASIC  STRUCTURE  OF  MODIFIED  16-BIT  CSLA 

It  is  similar  to  regular  16-bit  SQRT  CSLA.  Only  change  is  that  in  basic  blocks  having  two  ripple-carry  adders,  one 
ripple  carry  adder  fed  with  a constant  1 carry-in  is  replaced  by  BEC.  The  area  estimation  of  each  group  is  calculated. 

Based  on  the  consideration  of  delay  values,  the  arrival  time  of  selection  input  Cl  [time  (T)  =7]  of  6:3  mux  is 
earlier  than  the  s3  [t  =9]  and  c3  [t  =7]  and  later  than  the  s2  [t  =4].  Thus,  the  sum3  and  final  c3  (output  from  mux)  are 
depending  on  s3  and  mux  and  partial  c3  (input  to  mux)  and  mux,  respectively.  The  sum2  depends  on  cl  and  mux.  For  the 
remaining  parts  the  arrival  time  of  mux  selection  input  is  always  greater  than  the  arrival  time  of  data  inputs  from  the 
BEC’s.  Thus,  the  delay  of  the  remaining  MUX  depends  on  the  arrival  time  of  mux  selection  input  and  the  mux  delay. 

First  the  coding  for  full  adder  and  multiplexers  of  6:3,  8:4,  10:5,  and  12:6  was  done.  The  BEC  program  was 
design  by  using  NOT,  XOR  and  AND  gates.  Then  2,  3,  4,  5-bit  ripple  carry  adder  was  done  by  calling  the  full  adder.  The 
modified  16-bit  CSLA  was  created  by  calling  the  ripple  carry  adders,  BEC  and  all  multiplexers  based  upon  the  circuit. 
Finally,  modified  128-bit  CSLA  was  created  by  calling  the  ripple  carry  adders,  BEC  and  all  multiplexers  based  on  circuit. 
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The  arrival  time  of  selection  input  of  6:3  mux  is  earlier.  Thus,  the  sum3  and  final  c3  (output  from  mux)  depends 
on  s3  and  mux  and  partial  c3  (input  to  mux)  and  mux,  respectively.  The  sum2  depends  on  eland  mux.  For  the  remaining 
group’s  the  arrival  time  of  mux  selection  input  is  always  greater  than  the  arrival  time  of  data  inputs  from  the  BEC’s.  Thus, 
the  delay  of  the  remaining  groups  depends  on  the  arrival  time  of  mux  selection  input  and  the  mux  delay. 


A|M]  BP]  Api:7]  B(10:7]  A[M]  B(i:4]  Ap]  Bp] 


Figure  5:  Modified  16-bit  SQRT  CSLA 

PROPOSED  METHOD 

The  binary  adder  is  the  critical  element  in  most  digital  circuit  designs  including  digital  signal  processors  (DSP) 
and  microprocessor  data  path  units.  As  such,  extensive  research  continues  to  be  focused  on  improving  the  power  delay 
performance  of  the  adder.  In  VLSI  implementations,  parallel -prefix  adders  are  known  to  have  the  best  performance. 
Reconfigurable  logic  such  as  Field  Programmable  Gate  Arrays  (FPGAs)  has  been  gaining  in  popularity  in  recent  years 
because  it  offers  improved  performance  in  terms  of  speed  and  power  over  DSP -based  and  microprocessor-based  solutions 
for  many  practical  designs  involving  mobile  DSP  and  telecommunications  applications  and  a significant  reduction  in 
development  time  and  cost  over  Application  Specific  Integrated  Circuit  (ASIC)  designs.  The  power  advantage  is  especially 
important  with  the  growing  popularity  of  mobile  and  portable  electronics,  which  make  extensive  use  of  DSP  functions. 
However,  because  of  the  structure  of  the  configurable  logic  and  routing  resources  in  FPGAs,  parallel -prefix  adders  will 
have  a different  performance.  In  particular,  most  modern  FPGAs  employ  a fast-carry  chain  which  optimizes  the  carry  path 
for  the  simple  Ripple  Carry  Adder  (RCA). 

In  this  paper,  the  practical  issues  involved  in  designing  and  implementing  tree-based  adders  on  FPGA’s.  An 
efficient  testing  strategy  for  evaluating  the  performance  of  these  adders  is  discussed. 

KOGGE-STONE  ADDER 

The  arrangement  of  the  prefix  network  specifies  the  type  of  the  Parallel  Prefix  Adder. The  Kogge-Stone  tree 
achieves  both  log2N  stages  and  fan-out  of  2 at  each  stage.  This  comes  at  the  cost  of  long  wires  that  must  be  routed  between 
stages.  The  tree  also  contains  more  PG  cells;  while  this  may  not  impact  the  area  if  the  adder  layout  is  on  a regular  grid,  it 
will  increase  power  consumption.  Despite  these  cost,  Kogge-Stone  adder  is  generally  used  for  wide  adders  because  it 
shows  the  lowest  delay  among  other  structures. 

The  complete  functioning  of  Kogge  Stone  Adder  can  be  easily  comprehended  by  analyzing  it  in  terms  of  three 
distinct  parts 
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Figure  6:  Steps  in  Parallel  Prefix  Adder 


Pre  Processing 

This  step  involves  computation  of  generate  and  propagate  signals  corresponding  too  each  pair  of  bits  in  A and  B. 
These  signals  are  given  by  the  logic  equations  below: 

Pi  = Ai  xor  Bi 

gi  = Ai  and  Bi 

Carry  Look  Ahead  Network 

This  block  differentiates  Kogge  Stone  Adder  from  other  adders  and  is  the  main  force  behind  its  high  performance. 
This  step  involves  computation  of  carries  corresponding  to  each  bit.  It  uses  group  propagate  and  generate  as  intermediate 
signals  which  are  given  by  the  logic  equations  below: 

Pi:j  = Pi:k+i  and  Pk:j 

Ghj  = Gi:k+i  or  (Pi:k+i  and  Gk:j ) 

Post  Processing 

This  is  the  final  step  and  is  common  to  all  adders  of  this  family  (carry  look  ahead).  It  involves  computation  of  sum 
bits.  Sum  bits  are  computed  by  the  logic  given  below: 

vSi  = pi  xor  Q.1 


IMPLEMENTATION  RESULTS 

The  design  proposed  in  this  paper  has  been  developed  using  Verilog-HDL,  Simulation  Done  by  using  Model  Sim 
6.5  e and  synthesized  in  Xilinx  ISE  9. li. 
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The  Regular  CSLA,  Modified  CSLA  and  Kogge  Stone  Adders  Simulation  and  Synthesis  Wave  forms  show  in 
below  figures.  The  128  bit  Modified  Carry  Select  adder  shown  figure  10  and  13  .It  compare  to  regular  CSLA  modified 
CSLA  architecture  is  take  less  amount  of  delay  it  is  approximately  reduced  by  30%  of  regular  method  design,  But  if  we  go 
for  the  Proposed  Design  (Kogge-Stone  Adder)  Delay  is  Less  Than  50%  of  Modified  CSLA,  Less  Than  65%  of  Regular 
CSLA  and  nearly  85%  less  Delay  as  Compare  Ripple  Carry  Adder,  The  Delay  Results  and  Graphs  are  shown  in  Table  1 
and  Figure  14 


Figure  8:  Regular  128-Bit  CSLA 





Figure  9:  Modified  128-Bit  CSLA 


Figure  10: 128  Bit  Kogge  - Stone  Adder 
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Figure  11: 128  Bit  Regular  CSLA 


Figure  12: 128  Bit  Modified  CSLA 


Figure  13:  128  Bit  Kogge-Stone  Adder 


Table  1:  Comparisons  of  Adders 


Delay 

(Ns) 

RCA 

CSLA 

Regular 

CSLA 

Modified 

Kogge  Stone 

16  Bit 

23.540 

18.420 

16.562 

16.481 

32  Bit 

44.366 

25.536 

21.411 

21.027 

64  Bit 

83.726 

39.588 

25.950 

27.167 

128  Bit 

162.446 

56.375 

40.562 

27.977 
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Figure  14:  Comparing  Graph  for  All  Adders 


CONCLUSIONS 

A simple  modification  is  maid  to  regular  CSLA  to  reduce  the  Delay  and  area  of  CSLA  architecture.  The  reduced 
number  of  gates  of  this  work  offers  the  great  advantage  in  the  reduction  of  area  and  also  the  total  power.  The  compared 
results  show  that  the  modified  CSLA  has  less  delay  and  it  requires  less  area  consumption,  but  this  CSLA  not  suitable  for 
Digital  Signal  Processing  Applications  because  DSP  applications  require  Very  Fast  computations  design.  Therefore 
Kogge-Stone  Parallel  Prefix  Adder  is  proposed,  it  has  nearly  High  Speed  Computation  as  Compared  to  Modified  CSLA. 
The  speed  of  the  proposed  adder  is  almost  double  the  existing  adders. 

In  This  Paper  all  the  Adders  RCA,  CSLA  Regular,  CSLA  Modified,  Kogge-Stone  with  16  Bit,  32  Bit,  64  Bit  and 
128  Bit  Sizes  are  developed. 

With  Respect  of  Results  and  Tables  Finally  it  can  be  concluded  that  For  Higher  Adder  Applications  Proposed 
Adder  is  Best  Choice  because  it  has  500  % high  Speed  compare  CSLA  Designs.  As  the  functional  verification  decides  the 
quality  of  the  silicon,  we  spend  60%  of  the  design  cycle  time  only  for  the  verification/simulation.  In  order  to  avoid  the 
delay  and  meet  the  TTM,  we  use  the  latest  verification  methodologies. 
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